Empirical Methods for MT Lexicon Development

Author

  • I. Dan Melamed
Abstract

This article reviews some recently invented methods for automatically extracting translation lexicons from parallel texts. The accuracy of these methods has been significantly improved by exploiting known properties of parallel texts and of particular language pairs. The state of the art has advanced to the point where translations can be found automatically and with high reliability, even for non-compositional compound phrases that are not translated word for word. Crucially, all of these methods can be smoothly integrated into the usual workflow of MT system developers. Partial automation of MT lexicon construction is likely to produce more accurate results more efficiently.

Similar resources

Reuse of linguistic resources in MT

Machine translation (MT) draws more heavily on lexical resources than most other NLP applications. First, grammars of both source and target languages require lexicons. Second, some sort of mapping between lexicons is required in order to transfer information from a source to a target language. The MT system described here is based on Shake-and-Bake technology and uses lexical transfer as the i...

An Empirical Architecture for Verb Subcategorization Frame - a Lexicon for a Real-world Scale Japanese-English Interlingual MT

The verb subcategorization frame information plays a major role of disambiguations in many NLP applications. Japanese, however, imposes difficulties of subcategorizing in part because it allows arbitrary ellipses of case elements. We propose a new type of verb subcategorization frame code set that combines the verb's surface case set and the deep case set, as a solution to the difficulties of e...

Benchmarking Machine Translated Sentiment Analysis for Arabic Tweets

Traditional approaches to Sentiment Analysis (SA) rely on large annotated data sets or wide-coverage sentiment lexica, and as such often perform poorly on under-resourced languages. This paper presents empirical evidence of an efficient SA approach using freely available machine translation (MT) systems to translate Arabic tweets to English, which we then label for sentiment using a state-of-th...

FUDR-based MT, head switching and the lexicon

We present an MT-approach which does transfer at the level of flat underspecified discourse representation structures. It allows for natural definitions of notoriously difficult structural divergencies between source and target, like head switching, by exploiting the formal means of semantic scope. The corresponding expressive lexicon formalism allows for a lexically driven, co-descriptive tran...

Publication date: 1998